








Vapnik-Chervonenkis Dimension of Axis-Parallel Cuts

Servane Gey
February 24, 2013 

Abstract 



The Vapnik-Chervonenkis (VC) dimension of the set of half-spaces of $\mathbb{R}^d$ with frontiers parallel to the axes is computed exactly. It is shown that it is much smaller than the intuitive value of $d$. A good approximation based on Stirling's formula shows that it is more likely of the order of $\log_2 d$.

This result may be used, for instance, to evaluate the performance of classifiers or regressors based on dyadic partitioning of $\mathbb{R}^d$. Algorithms using axis-parallel cuts to partition $\mathbb{R}^d$ are often used to reduce the computational time of such estimators when $d$ is large.

1 Introduction

The VC dimension of a set of subsets was introduced by Vapnik and Chervonenkis [9, 10] to measure its complexity. The VC dimension of a real-valued function space $\mathcal{F}$ is then the VC dimension of $\{\{x ; f(x) \geq 0\} ; f \in \mathcal{F}\}$. In particular, the VC dimension of sets of classifiers or regressors commonly appears in statistical learning when their performance is evaluated. For example, Vapnik's theory in the classification framework is now widely known (see [3] for instance): let $(X, Y)$ be a pair of random variables taking values in $\mathbb{R}^d \times \{0; 1\}$, and let $\mathcal{L}$ be a sample of $n$ independent replications of $(X, Y)$. If $\hat{f}$ is a classifier minimizing the average misclassification rate on $\mathcal{L}$ over a set of classifiers having finite VC dimension $V$, then, without further assumption on the distribution $P$ of $(X, Y)$, the performance of $\hat{f}$ is evaluated as follows:

$$\mathbb{E}_{\mathcal{L}}\left[P\left(\hat{f}(X) \neq Y\right)\right] \leq C_1\, \mathrm{bias}(\hat{f}) + C_2 \sqrt{\frac{V}{n}}, \qquad (1)$$

where $\mathbb{E}_{\mathcal{L}}$ denotes the expectation with respect to the sample distribution, $\mathrm{bias}(\hat{f})$ denotes the bias of the classifier $\hat{f}$, and $C_1$ and $C_2$ are absolute constants.

Functional estimates defined on partitions of $\mathbb{R}^d$ (such as histograms, piecewise polynomials, or splines, for example) are often used to estimate the relationship between two variables $X \in \mathbb{R}^d$ and $Y \in \{0; 1\}$ or $Y \in \mathbb{R}$. In many cases, the VC dimension of the set of subsets used to construct the partition appears in the risk bounds used to evaluate the performance of such estimators. For example, if the set used is the set of all half-spaces of $\mathbb{R}^d$, its VC dimension $d+1$ often has to be taken into account.

When $d$ is large, it is often computationally easier to construct partitions using axis-parallel cuts. For example, some theoretical developments on dyadic partitions of $\mathbb{R}^d$ are given in [1, 7], and the VC dimension of axis-parallel cuts appears more particularly in the results obtained on the performance of the classification and regression binary decision trees (CART) introduced by Breiman et al. [2] in 1984, and theoretically studied in [8, 4, 5, 6].

2 Reminder about VC Dimension 

The VC dimension of a set $\mathcal{A}$ of subsets of some measurable space $\mathcal{X}$ is based on counting the number of distinct intersections of the elements of $\mathcal{A}$ with a finite set of fixed points in $\mathcal{X}$.

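Definition 1 (Vapnik-Chervonenkis Dimension). Let $\mathcal{A}$ be a set of subsets of some measurable space $\mathcal{X}$. Then $(x_1, \dots, x_n) \in \mathcal{X}^n$ will be said to be shattered by $\mathcal{A}$ if all subsets of $\{x_1; \dots; x_n\}$ are covered by $\mathcal{A}$, that is, if

$$\left|\left\{\{x_1, \dots, x_n\} \cap A ; A \in \mathcal{A}\right\}\right| = 2^n.$$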

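The Vapnik-Chervonenkis dimension $VC(\mathcal{A})$ of $\mathcal{A}$ is then defined as the maximal integer $n$ such that there exist $n$ points in $\mathcal{X}$ shattered by $\mathcal{A}$, i.e.

$$VC(\mathcal{A}) = \max\left\{n ; \max_{(x_1, \dots, x_n) \in \mathcal{X}^n} \left|\left\{\{x_1, \dots, x_n\} \cap A ; A \in \mathcal{A}\right\}\right| = 2^n\right\}.$$

If this maximum does not exist, that is, if sets of $n$ points can be shattered for arbitrarily large $n$, then $VC(\mathcal{A}) = +\infty$.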



Thus, the larger $VC(\mathcal{A})$ is, the more complex $\mathcal{A}$ is.

For example, if $\mathcal{A} = \{]-\infty; x] ; x \in \mathbb{R}\}$, then $VC(\mathcal{A}) = 1$; and if $\mathcal{A}$ is the set of all half-spaces in $\mathbb{R}^d$, then $VC(\mathcal{A}) = d + 1$.
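The half-line example can be checked by brute force. The sketch below (the helper names are hypothetical) enumerates the traces of $\{]-\infty; a] ; a \in \mathbb{R}\}$ on a finite set of distinct reals and tests the shattering condition of Definition 1. Searching over a finite candidate set only certifies a lower bound on the VC dimension in general; for half-lines this is enough, because a trace depends only on the order of the points.

```python
from itertools import combinations

def half_line_traces(points):
    """Distinct traces {p in points : p <= a} of the half-lines ]-inf; a]."""
    values = sorted(points)
    # A threshold below the minimum gives the empty trace; a threshold at each
    # point gives each nonempty trace, so these thresholds realise all traces.
    thresholds = [values[0] - 1] + values
    return {frozenset(p for p in points if p <= a) for a in thresholds}

def is_shattered(points):
    """Definition 1 for distinct points: all 2^n subsets must appear as traces."""
    return len(half_line_traces(points)) == 2 ** len(points)

def vc_lower_bound(candidates, max_n):
    """Largest n <= max_n such that some n distinct candidates are shattered."""
    best = 0
    for n in range(1, max_n + 1):
        if any(is_shattered(list(c)) for c in combinations(candidates, n)):
            best = n
    return best

print(is_shattered([0.0]))              # True
print(is_shattered([0.0, 1.0]))         # False: {1.0} alone is never a trace
print(vc_lower_bound([0, 1, 2, 3], 4))  # 1
```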

Since the set of axis-parallel cuts is a subset of the set of all half-spaces in $\mathbb{R}^d$, it would be natural to think that its VC dimension is of order $d$. Actually, it is shown in what follows that it is of order $\log_2 d$.

3 VC Dimension of axis-parallel cuts 

We give a formula to compute the VC dimension of axis-parallel cuts in $\mathbb{R}^d$. Since the obtained formula is not always easy to handle, an approximation is also given.

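Lemma 1. Let

$$\mathcal{A}_d = \left\{\{x \in \mathbb{R}^d ; x^i \leq a\} ; i = 1, \dots, d,\ a \in \mathbb{R}\right\}.$$

Then

$$VC(\mathcal{A}_d) = \max\left\{n ; \binom{n}{\lfloor n/2 \rfloor} \leq d\right\},$$

where $\lfloor x \rfloor$ denotes the integer part of $x$.

Furthermore, the following approximation of $VC(\mathcal{A}_d)$ is available for all $d \geq 2$:

$$\frac{\log d}{\log 2} - 0.38 \leq VC(\mathcal{A}_d) \leq \frac{\log\left(d\sqrt{d+3}\right)}{\log 2} + 0.51.$$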
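Lemma 1 is easy to evaluate numerically. A minimal sketch (function names are hypothetical) computes $VC(\mathcal{A}_d)$ from the formula, using that $\binom{n}{\lfloor n/2 \rfloor}$ is nondecreasing in $n$, and evaluates the two Stirling bounds for a few values of $d$.

```python
import math

def vc_axis_parallel_cuts(d):
    """max{n : C(n, floor(n/2)) <= d} from Lemma 1, for an integer d >= 1."""
    n = 1  # C(1, 0) = 1 <= d
    # C(n, floor(n/2)) is nondecreasing in n: stop at the first n + 1 exceeding d.
    while math.comb(n + 1, (n + 1) // 2) <= d:
        n += 1
    return n

def stirling_bounds(d):
    """Lower and upper bounds of Lemma 1 (stated for d >= 2)."""
    lower = math.log2(d) - 0.38
    upper = math.log2(d * math.sqrt(d + 3)) + 0.51
    return lower, upper

for d in (2, 10, 100, 1000, 10**6):
    low, up = stirling_bounds(d)
    print(d, vc_axis_parallel_cuts(d), round(low, 2), round(up, 2))
```

The loop only needs exact integer binomials (`math.comb`), so it stays exact for very large $d$.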

Figure 1 shows that $VC(\mathcal{A}_d)$ is a piecewise constant function of the space dimension $d$, which increases at a rate much smaller than the intuitive value $d$. It also shows that the bounds computed from Stirling's formula are sharp.

Figure 1: $VC(\mathcal{A}_d)$ with respect to the space dimension $d$, and Stirling's bounds.

Proof. The idea is that, for $n$ points $(x_1, \dots, x_n)$ to be shattered by $\mathcal{A}_d$, all the subsets of $\{x_1, \dots, x_n\}$ should be covered by $\mathcal{A}_d$. But, along each coordinate, $\mathcal{A}_d$ covers at most one subset having $p$ elements; so if, for some $p \leq n$, there are more than $d$ subsets of $\{x_1, \dots, x_n\}$ having $p$ elements, then $\mathcal{A}_d$ misses at least $\binom{n}{p} - d$ of them: let $n \geq 1$ and let $(x_1, \dots, x_n)$ be $n$ points in $\mathbb{R}^d$. Suppose that $n$ is such that $\binom{n}{\lfloor n/2 \rfloor} > d$. This means that there are at least $d+1$ subsets of $\{x_1, \dots, x_n\}$ of size $\lfloor n/2 \rfloor$. For each coordinate $i = 1, \dots, d$, let us denote by $x^i_{(\cdot)}$ the order statistics computed from the $i$-th coordinates of $(x_1, \dots, x_n)$, that is, for all $i = 1, \dots, d$,

$$x^i_{(1)} \leq x^i_{(2)} \leq \dots \leq x^i_{(n)},$$

and by $x_{i(j)}$ the point of $\{x_1, \dots, x_n\}$ whose $i$-th coordinate is $x^i_{(j)}$.



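Let $p = \lfloor n/2 \rfloor$ and let

$$\mathcal{B}_p = \left\{\{x_{i(1)}; \dots; x_{i(p)}\} ; i = 1, \dots, d \text{ and } \left|\{x_{i(1)}; \dots; x_{i(p)}\}\right| = p\right\},$$

$$\overline{\mathcal{B}}_p = \left\{B \subset \{x_1, \dots, x_n\} ; |B| = p \text{ and } B \notin \mathcal{B}_p\right\}.$$

Hence $\mathcal{B}_p$ is covered by $\mathcal{A}_d$ (by simply taking $A = \{x^i \leq (x^i_{(p)} + x^i_{(p+1)})/2\}$ for each coordinate), and we have that

$$|\mathcal{B}_p| \leq d \quad \text{and} \quad |\overline{\mathcal{B}}_p| \geq \binom{n}{p} - d > 0.$$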



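Let $B \in \overline{\mathcal{B}}_p$ and $A = \{x^i \leq a\} \in \mathcal{A}_d$. If $|\{x_1, \dots, x_n\} \cap A| \neq p$, then $\{x_1, \dots, x_n\} \cap A \neq B$. Otherwise, since $\{x_1, \dots, x_n\} \cap A = \{x_j ; x^i_j \leq a\}$, we have that $x^i_{(j)} \leq a$ for all $j = 1, \dots, p$, and $x^i_{(j)} > a$ for all $j = p+1, \dots, n$. So $\{x_1, \dots, x_n\} \cap A = \{x_{i(1)}; \dots; x_{i(p)}\}$ and $|\{x_{i(1)}; \dots; x_{i(p)}\}| = p$, leading to $\{x_1, \dots, x_n\} \cap A \in \mathcal{B}_p$, and then to $\{x_1, \dots, x_n\} \cap A \neq B$. So, for all $B \in \overline{\mathcal{B}}_p$ and all $A \in \mathcal{A}_d$, $\{x_1, \dots, x_n\} \cap A \neq B$.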

So, if $\binom{n}{\lfloor n/2 \rfloor} > d$, $(x_1, \dots, x_n)$ cannot be shattered by $\mathcal{A}_d$. Thus

$$VC(\mathcal{A}_d) \leq \max\left\{n ; \binom{n}{\lfloor n/2 \rfloor} \leq d\right\}.$$
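The counting step of this upper bound can be illustrated on random data. The sketch below (hypothetical helper names; points are integer tuples of $\mathbb{R}^d$) enumerates the traces of $\mathcal{A}_d$ on a finite point set. Along one coordinate the traces are nested, so each coordinate contributes at most one trace of each size, and at most $d$ traces of a given size $p$ with $0 < p < n$ can occur.

```python
import random
from collections import Counter

def axis_cut_traces(points):
    """Distinct traces {x_j : x_j^i <= a} of A_d, as frozensets of indices j."""
    n, d = len(points), len(points[0])
    traces = set()
    for i in range(d):
        values = sorted({x[i] for x in points})
        # Below the minimum (empty trace), then at each distinct value (a prefix).
        for a in [values[0] - 1] + values:
            traces.add(frozenset(j for j in range(n) if points[j][i] <= a))
    return traces

def max_traces_of_one_size(points):
    """Largest number of distinct traces sharing one size p with 0 < p < n."""
    n = len(points)
    sizes = Counter(len(t) for t in axis_cut_traces(points))
    return max((sizes[p] for p in range(1, n)), default=0)

random.seed(0)
d, n = 3, 4  # C(4, 2) = 6 > 3, so 4 points cannot be shattered in R^3
pts = [tuple(random.randint(0, 5) for _ in range(d)) for _ in range(n)]
print(len(axis_cut_traces(pts)), "traces out of", 2 ** n)
print(max_traces_of_one_size(pts), "<=", d)
```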



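Let $n \geq 1$ be such that $\binom{n}{\lfloor n/2 \rfloor} \leq d$. Let $(x_1, \dots, x_n)$ be $n$ points of $\mathbb{R}^d$ defined as follows: for each coordinate $i = 1, \dots, \binom{n}{\lfloor n/2 \rfloor}$, let $\{i_1; \dots; i_{\lfloor n/2 \rfloor}\}$ be the $i$-th subset of $\lfloor n/2 \rfloor$ indices in $\{1; \dots; n\}$, where the indices are listed in ascending order, i.e.:

$$1 \leq i_1 < \dots < i_{\lfloor n/2 \rfloor} \leq n.$$

Since $\binom{n}{\lfloor n/2 \rfloor} \leq d$, there are enough coordinates to assign one to each of the $\binom{n}{\lfloor n/2 \rfloor}$ distinct subsets of indices. Hence we take, for each such coordinate,

$$x^i_{i_j} = j, \quad j = 1, \dots, \lfloor n/2 \rfloor.$$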
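Then the remaining values of $(x_1, \dots, x_n)$ are taken as follows:

• Since $\binom{n}{\lfloor n/2 \rfloor + 1} \leq \binom{n}{\lfloor n/2 \rfloor}$, for each subset $\{i_1; \dots; i_{\lfloor n/2 \rfloor + 1}\}$ of $\{1; \dots; n\}$ with $\lfloor n/2 \rfloor + 1$ elements, there exists $i' \in \{1; \dots; \binom{n}{\lfloor n/2 \rfloor}\}$ such that $\{i_1; \dots; i_{\lfloor n/2 \rfloor}\} = \{i'_1; \dots; i'_{\lfloor n/2 \rfloor}\}$. Then take $x^{i'}_{i_{\lfloor n/2 \rfloor + 1}} = \lfloor n/2 \rfloor + 1$. Note that, if $n$ is odd, there is a bijection between $i$ and $i'$.

• Let $\{j_1; \dots; j_m\} = \{j \notin \{i_1; \dots; i_{\lfloor n/2 \rfloor + 1}\}\}$, with $j_1 < \dots < j_m$, and let $j_0 = i_{\lfloor n/2 \rfloor + 1}$. Then take $x^{i'}_{j_k} = x^{i'}_{j_{k-1}} + 1$ for $k = 1, \dots, m$.

If not filled, the last coordinates are set equal to $n$.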

Hence, we obtain that, for all $j \notin \{i_1; \dots; i_{\lfloor n/2 \rfloor}\}$, $x^i_j \geq \lfloor n/2 \rfloor + 1$.

Then $(x_1, \dots, x_n)$ is shattered by $\mathcal{A}_d$: for $p \in \{0; \dots; n\}$, let $B = \{x_{i_1}; \dots; x_{i_p}\} \subset \{x_1, \dots, x_n\}$, with $1 \leq i_1 < i_2 < \dots < i_p \leq n$ as soon as $p \neq 0$.

If $p = 0$, let

$$i_0 = \operatorname{argmin}_{1 \leq i \leq d} \min_j x^i_j,$$

and take $A = \{x^{i_0} \leq \min_j x^{i_0}_j - 1\}$. Then $B = \{x_1, \dots, x_n\} \cap A = \emptyset$.

If $p = n$, let

$$i_n = \operatorname{argmax}_{1 \leq i \leq d} \max_j x^i_j,$$

and take $A = \{x^{i_n} \leq \max_j x^{i_n}_j + 1\}$. Then $B = \{x_1, \dots, x_n\} \cap A = \{x_1, \dots, x_n\}$.

If $0 < p \leq \lfloor n/2 \rfloor$, let $A \in \mathcal{A}_d$ be the subset defined by $A = \{x^i \leq p + 1/2\}$, with $i$ the coordinate corresponding to a subset of indices $\{i_1; \dots; i_{\lfloor n/2 \rfloor}\}$ containing $\{i_1; \dots; i_p\}$. Then, by definition of $(x^i_1, \dots, x^i_n)$, $B = \{x_1, \dots, x_n\} \cap A$.

If $\lfloor n/2 \rfloor + 1 \leq p < n$, let $i'$ be the coordinate corresponding to the configuration $\{i_1; \dots; i_{\lfloor n/2 \rfloor + 1}\}$ (as defined by $(x_1, \dots, x_n)$). Let $A \in \mathcal{A}_d$ be the subset defined by $A = \{x^{i'} \leq p + 1/2\}$. Then, by definition of $(x^{i'}_1, \dots, x^{i'}_n)$, $B = \{x_1, \dots, x_n\} \cap A$.

Thus

$$VC(\mathcal{A}_d) \geq \max\left\{n ; \binom{n}{\lfloor n/2 \rfloor} \leq d\right\}.$$
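For small $n$ the lower bound can also be checked directly. The sketch below is a hypothetical illustration, not the construction used in the proof: it encodes $n$ points with distinct coordinates by $d$ orderings of the points (coordinate $i$ of point $j$ is the rank of $j$ in the $i$-th ordering), so that the nonempty traces along coordinate $i$ are the prefixes of that ordering. The six hand-picked orderings produce all $\binom{4}{2} = 6$ pairs as prefixes, hence 4 points shattered by $\mathcal{A}_6$, in agreement with $VC(\mathcal{A}_6) = 4$.

```python
def points_from_orderings(orderings):
    """Point j gets coordinate i equal to the rank (1-based) of j in orderings[i]."""
    n = len(orderings[0])
    return [tuple(order.index(j) + 1 for order in orderings) for j in range(n)]

def is_shattered_by_cuts(points):
    """True if the traces of A_d on the points give all 2^n subsets."""
    n, d = len(points), len(points[0])
    traces = {frozenset()}  # a threshold below every coordinate gives the empty set
    for i in range(d):
        for a in sorted({x[i] for x in points}):
            traces.add(frozenset(j for j in range(n) if points[j][i] <= a))
    return len(traces) == 2 ** n

# Hand-picked orderings of the points 0, 1, 2, 3 (one per coordinate of R^6).
ORDERS_4 = [(0, 1, 2, 3), (1, 2, 3, 0), (2, 0, 3, 1),
            (3, 1, 0, 2), (0, 3, 1, 2), (2, 3, 0, 1)]
# Cyclic orderings of 0, 1, 2 shatter 3 points in R^3.
ORDERS_3 = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]

print(is_shattered_by_cuts(points_from_orderings(ORDERS_4)))      # True
print(is_shattered_by_cuts(points_from_orderings(ORDERS_4[:5])))  # False: {2, 3} missing
print(is_shattered_by_cuts(points_from_orderings(ORDERS_3)))      # True
```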



Then, the lower and upper bounds of $VC(\mathcal{A}_d)$ are computed by using Stirling's formula: for all $n \geq 1$ we have

$$\sqrt{2\pi}\, e^{-(n+1)} (n+1)^{n+1/2} \leq n! \leq \sqrt{2\pi}\, e^{-(n+1)} e^{\frac{1}{12(n+1)}} (n+1)^{n+1/2}.$$
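The two-sided Stirling bound above can be checked numerically on a log scale. A minimal sketch (the function name is hypothetical), using `math.lgamma(n + 1)`, which equals $\log n!$ and avoids overflow:

```python
import math

def log_stirling_bounds(n):
    """Logs of sqrt(2 pi) e^{-(n+1)} (n+1)^{n+1/2}, and of that times e^{1/(12(n+1))}."""
    m = n + 1
    log_lower = 0.5 * math.log(2 * math.pi) - m + (n + 0.5) * math.log(m)
    log_upper = log_lower + 1 / (12 * m)
    return log_lower, log_upper

for n in (1, 5, 20, 100):
    lo, up = log_stirling_bounds(n)
    print(n, lo <= math.lgamma(n + 1) <= up)
```

The range is kept moderate because the gap on the upper side shrinks like $1/(360 (n+1)^3)$ and eventually reaches floating-point resolution.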

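A simple calculation gives the following: if $n$ is even, then

$$\binom{n}{n/2} \leq \frac{e}{\sqrt{2\pi}}\, e^{\frac{1}{12(n+1)}}\, \frac{(n+1)^{n+1/2}}{(n+2)^{n+1}}\, 2^{n+1} \leq \frac{e^{1+\frac{1}{36}}}{\sqrt{6\pi}}\, 2^{n+1},$$

and if $n$ is odd, then

$$\binom{n}{\lfloor n/2 \rfloor} \leq \frac{e}{\sqrt{2\pi}}\, e^{\frac{1}{12(n+1)}}\, \frac{(n+1)^{(n+1)/2}}{(n+3)^{n/2+1}}\, 2^{n+1} \leq \frac{e^{1+\frac{1}{24}}}{\sqrt{8\pi}}\, 2^{n+1}.$$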



Thus, if

$$\frac{e^{1+\frac{1}{24}}}{\sqrt{6\pi}}\, 2^{n+1} \leq d,$$

then $\binom{n}{\lfloor n/2 \rfloor} < d$. Taking the logarithm leads to the lower bound of $VC(\mathcal{A}_d)$.
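The constant of this lower-bound step can be checked numerically. A minimal sketch (names are hypothetical): it verifies $\binom{n}{\lfloor n/2 \rfloor} \leq \frac{e^{1+1/24}}{\sqrt{6\pi}} 2^{n+1}$ for a range of $n$, and solves $\frac{e^{1+1/24}}{\sqrt{6\pi}} 2^{n+1} = d$ for real $n$, which gives $n = \log_2 d - 1 - \log_2 \frac{e^{1+1/24}}{\sqrt{6\pi}}$, an offset of about $-0.385$, consistent with the constant $0.38$ of Lemma 1 to two decimals.

```python
import math

C_LOW = math.exp(1 + 1 / 24) / math.sqrt(6 * math.pi)  # about 0.653

def binomial_upper_bound_holds(n):
    """Check C(n, floor(n/2)) <= C_LOW * 2^(n+1) (exact int compared with a float)."""
    return math.comb(n, n // 2) <= C_LOW * 2 ** (n + 1)

# Solving C_LOW * 2^(n+1) = d for real n gives n = log2(d) + LOWER_OFFSET.
LOWER_OFFSET = -1 - math.log2(C_LOW)

print(all(binomial_upper_bound_holds(n) for n in range(1, 200)))
print(round(LOWER_OFFSET, 3))
```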

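On the other hand, if $\binom{n}{\lfloor n/2 \rfloor} \leq d$, then $n \leq d$ (since $\binom{n}{\lfloor n/2 \rfloor} \geq n$), and we have that, if $n$ is even,

$$\binom{n}{n/2} \geq \frac{e}{\sqrt{2\pi}}\, e^{-\frac{1}{3n+6}}\, \frac{(n+1)^{n+1/2}}{(n+2)^{n+1}}\, 2^{n+1} \geq \frac{e^{-1/12}}{\sqrt{2\pi}}\, \frac{2^{n+1}}{\sqrt{d+2}},$$

and if $n$ is odd,

$$\binom{n}{\lfloor n/2 \rfloor} \geq \frac{e}{\sqrt{2\pi}}\, e^{-\frac{1}{6(n+1)} - \frac{1}{6(n+3)}}\, \frac{(n+1)^{(n+1)/2}}{(n+3)^{n/2+1}}\, 2^{n+1} \geq \frac{e^{-1/8}}{\sqrt{2\pi}}\, \frac{2^{n+1}}{\sqrt{d+3}}.$$

Thus, since, for all $n$ such that $\binom{n}{\lfloor n/2 \rfloor} \leq d$,

$$2^{n+1} \leq \sqrt{2\pi}\, e^{1/8}\, d\sqrt{d+3},$$

the upper bound of $VC(\mathcal{A}_d)$ is found by taking the logarithm of this last expression.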
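Similarly, a minimal sketch (names are hypothetical) of the upper-bound step: it checks the two lower bounds on $\binom{n}{\lfloor n/2 \rfloor}$ at the smallest admissible $d$, namely $d = \binom{n}{\lfloor n/2 \rfloor}$ (the right-hand sides decrease in $d$), and computes the offset $\log_2\sqrt{2\pi} + \frac{1}{8\log 2} - 1 \approx 0.506$ obtained by taking base-2 logarithms in $2^{n+1} \leq \sqrt{2\pi}\, e^{1/8}\, d\sqrt{d+3}$, which stays below the constant $0.51$ of Lemma 1.

```python
import math

C_EVEN = math.exp(-1 / 12) / math.sqrt(2 * math.pi)
C_ODD = math.exp(-1 / 8) / math.sqrt(2 * math.pi)

def binomial_lower_bound_holds(n, d):
    """Check C(n, floor(n/2)) >= c * 2^(n+1) / sqrt(d + s), s = 2 (n even) or 3 (n odd)."""
    c, s = (C_EVEN, 2) if n % 2 == 0 else (C_ODD, 3)
    return math.comb(n, n // 2) >= c * 2 ** (n + 1) / math.sqrt(d + s)

# Taking log2 in 2^(n+1) <= sqrt(2 pi) e^(1/8) d sqrt(d + 3) gives
# n <= log2(d sqrt(d + 3)) + UPPER_OFFSET.
UPPER_OFFSET = math.log2(math.sqrt(2 * math.pi)) + 1 / (8 * math.log(2)) - 1

print(all(binomial_lower_bound_holds(n, math.comb(n, n // 2)) for n in range(1, 200)))
print(round(UPPER_OFFSET, 3))
```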